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Electromagnetic (EM) follow-up observations of gravitational wave (GW) events will help shed 
light on the nature of the sources, and more can be learned if the EM follow-ups can start as 
soon as the GW event becomes observable. In this paper, we propose a computationally efficient 
time-domain algorithm capable of detecting gravitational waves (GWs) from coalescing binaries 
of compact objects with nearly zero time delay. In case when the signal is strong enough, our 
algorithm also has the flexibility to trigger EM observation before the merger. The key to the 
efficiency of our algorithm arises from the use of chains of so-called Infinite Impulse Response 
(IIR) filters, which filter time-series data recursively. Computational cost is further reduced by a 
template interpolation technique that requires filtering to be done only for a much coarser template 
bank than otherwise required to sufficiently recover optimal signal-to-noise ratio. Towards future 
detectors with sensitivity extending to lower frequencies, our algorithm's computational cost is shown 
to increase rather insignificantly compared to the conventional time-domain correlation method. 
Moreover, at latencies of less than hundreds to thousands of seconds, this method is expected to be 
computationally more efficient than the straightforward frequency-domain method. 

PACS numbers: 04.80.Nn, 95.75.-z, 97.80.-d, 97.60.Gb 



I. INTRODUCTION 

Coalescences of neutron-star (NS) binaries are primary 
sources for ground-based gravitational- wave detectors. It 
has been estimated that Advanced LIGO may be able to 
detect 10 to 100 such events per year [IJ. The merg- 
ers of neutron star binaries are also possible progeni- 
tors of short hard 7-ray bursts. Although these bursts 
are believed to be mostly beamed away from us, the 
prompt emission and afterglow they induce in X-ray, op- 
tical, infrared and radio frequency bands may well be 
less beamed, and therefore be visible to us 03 . If a 
statistically significant gravitational-wave trigger can be 
obtained before or right after such a coalescence, electro- 
magnetic (especially optical) observatories can then be 
alerted to search for possible prompt and afterglow emis- 
sions — such follow-up observations are likely able to 
resolve whether these mergers are indeed the progenitors 
of short hard 7-ray bursts, and provide further knowledge 
about the nature of these events. 

Currently, neutron star - neutron star coalescence sig- 
nals are being searched for in gravitational-wave data 
using the matched filtering technique [U [5], which cal- 
culates the correlation of data with theoretical templates 
weighted by noise. In order to reduce the computational 
cost, current search pipelines use a frequency-domain 
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method, which gathers a long stretch of time-series data 
containing 0(N) points (the duration of which should 
be longer than the longest possible signal), then uses a 
Fast-Fourier- Transform (FFT) algorithm to search for all 
possible signals that end within this stretch of data, with 
a cost of 0(N log N), as opposed to the 0(N 2 ) required 
by a one-by-one search over merger time. Such a trick, 
although efficient, implies that wc cannot start analyzing 
the data until the collection finishes. 

Unless significant changes from current frequency- 
domain analysis method are made, the latency caused 
by data collection will compromise our ability to obtain a 
trigger with the shortest possible delay after the merger, 
and will totally prevent us from obtaining the trigger 
before the merger. At least two efforts are underway to 
suppress latencies for coalescence signals, the Multi-Band 
Template Analysis (MBTA) [6] and the Low-Latency On- 
line Inspiral Detector (LLOID) [7]. MBTA is a two-band 
frequency-domain search method while LLOID provides 
an infrastructure that accommodates either time or fre- 
quency domain searches. The time-domain aspect of the 
LLOID pipeline based on Finite-Impulse- Response (FIR) 
filters [8] is described in a parallel paper |7]. Note for a 
different search of short gravitational waves of unknown 
waveforms, a program has been set up to analyze avail- 
able detector data in near real-time and seek for optical 
counterpart of candidate events [9]. 

In this paper, we propose a straightforward and effi- 
cient time-domain search algorithm, which allows zero 
and even negative latency (i.e., obtaining trigger before 
the merger if the signal-to-noise ratio (SNR) condition 
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TABLE I: Basic information for the detection of Newtonian 
GW signals by initial, Advanced LIGO and Einstein Tele- 
scope. The columns, from left to right list the names for 
present and future detectors, the minimum frequency of the 
detector, signal duration and number of wave cycles for a 
(1.4+14)M NS-NS binary [see Sec. |IIIA| , as well as the num- 
ber of templates required in order to achieve a match of 0.98 
for binaries with individual mass of 1 - 3 Mq [computed from 



the metric Eq. ( 57 1] 



and other consistency conditions are met) in the most 
natural way. Admittedly, without the savings made avail- 
able by FFT, the computational cost of a straightforward 
implementation using FIR filters can be formidable. In 
the correlation calculation, each template contains a large 
number of wave cycles, and there exists a large number of 
templates — and both these numbers increase dramati- 
cally with the lowering of the minimum frequency cutoff 
/min (Table [i] ) . This poses serious computational chal- 
lenge for detecting GWs from compact object coalescence 
for future GW detectors. 

We propose two techniques that can dramatically 
increase the computational efficiency for time-domain 
searches of GWs from coalescing binaries of compact ob- 
jects in real-time, and make it feasible for future detec- 
tors with frequency cut-offs at as low as / m i n = 3 Hz. 
The first technique uses the well-known Infinite Impulse 
Response, or IIR filters [5], which can be computed with 
much higher efficiency than FIR filters. We propose to 
filter the data using a bank of IIR filters, the sum of 
which approximates each individual binary coalescence 
waveform template. The second technique reduces the 
number of templates by an interpolation technique that 
applies to the proposed IIR filter method. In this ap- 
proach, we first divide the bank of IIR filters associated 
with each template into sub-groups, and then reconstruct 
the filter outputs of a fine template bank by recombin- 
ing the filter outputs from each of these sub-groups with 
appropriate complex coefficients and time delays. This 
is similar to the generic multi-band interpolation scheme 
used in MBTA and LLOID EH EQ . 

Several conventions are used in this paper. The term 
latency refers generally to the delay from the time when a 
signal arrives at the detector to the time the data contain- 
ing the signal actually starts to be analyzed. We specifi- 
cally focus on the delay starting from the time when the 
data are ready to be analyzed. One example of the la- 
tency is the delay due to data accumulation before a Fast 
Fourier Transformation (FFT) can be performed. The 
term real time processing means that data points or data 
segments are processed (with outputs generated) at a rate 
that is equal to their input rate. Floating Point Oper- 



ation is abbreviated as FLOP (plural FLOPs). FLOPS 
and flops are used interchangeably to stand for Floating 
Point Operations per Second. Throughout this paper, we 
follow the convention of counting each real addition and 
real multiplication equally as one FLOP. 

This paper is structured as follows. In Section [TT] , 
we briefly review the basics of matched filtering tech- 
nique and introduce time-domain IIR filters. In Section 



III we use Newtonian-order templates as an example 
to construct IIR filters, characterize the error involved 
and calculate the computational cost for each individ- 
ual template. In Section [IV| we present an interpolation 
technique that allows us to use a significantly decreased 
number of templates for which filter chains must be im- 
plemented. In Section [Vj we make a simple comparison 
between the computational cost of IIR filtering and the 
straightforward frequency-domain algorithm. In Section 
VII we summarize our main conclusions. 



II. MATCHED FILTERING TECHNIQUE 

The optimal technique to extract a signal from noisy 
data when we have reliable theoretical predictions for the 
signal waveform is to use matched filtering [H |S] . The 
output of the matched filtering technique is basically the 
correlation of data with expected waveforms weighted by 
noise. This can be realized in the frequency or time do- 
main. We will give a brief overview of the matched filter- 
ing technique, and introduce its frequency-domain imple- 
mentation and its time-domain approach using the FIR 
and IIR filters. 



A. Frequency-domain implementation 

1. Single Template 

Suppose the output of the interferometer h is a sum of 
noise n and, if exists, a signal s: 



h = 



(1) 



For the moment, let us assume that s is a single known 
waveform. In Eq. ([I]), we have intentionally left out the 
arguments of the functions h, n, and s, which reflects 
the point of view that each of them can be equivalently 
represented both in the time and frequency domain. More 
specifically, we use the following convention for Fourier 
transform, which relates h(t) and h(f) (we shall use tilde 
to emphasize a frequency-domain representation): 



h{f)= / dte^ ft h{t). 



(2) 



The power spectral density of n{t) is denoted by Sh{f), 
which is defined by 
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E[n(f)n*(f)] = -6(f-r)S h (f) 



(3) 
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Here we use one-sided spectral density, E[ ] denotes the 
expectation value over an ensemble of realizations of the 
noise and "*" denotes complex conjugation. Sh(f) — 
Sh(\f\) as the noise in the time domain n(t) is real. 

In order to extract s from h, we perform filtering, which 
consists of taking the inner product between data h and 
template u, forming a filter output of y: 



V = ( h \ u ) = ( s \ u ) + ( n \ u ) 
Here we define inner product as 



(4) 



(a\b) 



2 / df 



= 4Re 



a* (/)&(/) + «(/)&*(/) 
ShU) 

„ S h (f) 



(5) 



In y, we have a signal component (s\u) and a noise com- 
ponent (n\u) which fluctuates around zero. If s has a 
substantially high amplitude and if the template u is ap- 
propriate, the signal component (s\u) in y will raise to 
a high value that merely random fluctuation of (n\u) is 
very unlikely to account for. As a consequence, we can 
impose a threshold on y — an incidence with y higher 
than the threshold is viewed as a detection of a signal. 
The detection efficiency depends on the signal-to-noise 
ratio (SNR) defined generally as 



y(n = 0) - E[y(s = 0)] 



a 



y(s=0) 



(6) 



where a 



y(s=0) 



is the standard deviation of the filter out- 



put when data contain noise only. Assuming zero-mean 
Gaussian noise, we have for Eq. Q 



2. Intrinsic and Extrinsic Parameters 

In reality, templates are not necessarily placed along 
each parameter dimension. The maximization of SNR 
over certain parameters can be conducted analytically 
and therefore no templates are needed. These parame- 
ters are called extrinsic parameters, while those that still 
have to be searched over one by one are called intrinsic 
parameters. 

As an example, for any generic waveform u(t) = 
Auo(t — t c )e l ^ c , where A is a real number, (fr c is the phase 
difference between u(t c ) and Uo(t c ), and t c is its ending 
time. The ending time t c is an extrinsic parameter, be- 
cause as a series of templates Uo(t — t c ) with a variety of 
t c are applied to the data h, the SNR 



p(t c ) -4 Re 



h*(f)Mf) 
S h (\f\) ' 



df 



(11) 



can be computed for all t c via a Fast Fourier Trans- 
form, which cost 0(N log N) FLOPs in the discretized 
case where N is the number of data points in the time 
domain. This is much faster than computing the correla- 
tion for all possible ending times, one by one, which cost 
0(N 2 ) operation counts — and in this way ending time t c 
is converted into an extrinsic parameters. The method of 
Fourier transformation will be discussed in detail in sub- 



section VB This process dominates the computational 
cost for the matched filtering method. Further analyti- 
cal optimization are known for the search of the constant 
phase 4> c . We assume the process is similar for all meth- 
ods discussed in this paper and that its computational 
cost is negligible. 



VE[\(n\uW 



(7) 



Note that the SNR does not depend on the normaliza- 
tion of the template u, and it is conventional to require 
that (u\u) = 1. In this case, the cross-correlation of a 
template with pure noise (n\u) is a random variable with 
zero mean and unity variance. It is easy to show that 
E[(n\a)(n\b)] = (a\b). So we have 

p = (s\u). (8) 

According to the Cauchy-Schwarz inequality, 

p = ^L^)<^{ S \s), (9) 
V( s \ s ) 

where equal sign takes place when u = As where A is 
a constant, and normalization of u gives A = 1/ \J ' (s\s) . 
This means the optimal SNR is given by the modulus 
of the signal, (s|s), and the reduction of SNR due to 
imperfectness of template is given by the match, which 
is also equal to unity minus mismatch, e: 



(s\u) 



y/(u\u)(s\s) 



= 1 



(10) 



B. Time-domain Approach: FIR and IIR method 

For the time-domain filtering we need to obtain a time 
series of SNRs as a function of presumed signal arrival 
time t 



P(t) 



h*(f)u(f) 
S h (\f\) ' 



df 



+ ■00 



with 



;(t) = 2 



w{t')u{t' - t)dt' 



s h (f) 



(12) 



(13) 



which can be thought of as "over- whitened data" ; it is a 
real-valued function of time. Note that in order to gen- 
erate the over- whitened data, we need to convolve h(t) 
with the Inverse Fourier Transform of \/Sh{f), which is a 
time-symmetric, oscillatory function that decays towards 
zero when t is much larger than the inverse of the interfer- 
ometer's bandwidth (> 100 Hz), which is about < 10ms. 
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This means the over-whitening process has an inherent 
latency not much larger than 10ms, which is negligible 
compared to the duration of the signal. 

We now discretize the filtering algorithm. The discrete 
form of Eq ( 12 ) becomes, 



Pk 



WjUj-k 



At 



(14) 



3=-oo 



Here we assume tk = kAt, and that u only have support 
within t < 0. While in principle the waveform could 
have an infinite support in time, — oo < kAt < 0. How- 
ever, the waveform u(t) is always assumed to begin only 
after its amplitude reaches sensitivity within the LIGO 
band. Hence we instead define the waveform to exist on 



the domain —NAt <t<0, and Eq (14) becomes 



Pk 



j=k-N 



UjUj-k 



At, 



(15) 



This summation of the product of data and template 
at each step turns out to be the general form of Finite 
Impulse Response (FIR) filters. The term finite comes 
from the fact that the output pk of the filter (its response) 
will become exactly zero after N time steps have passed 
since a single initial impulse in the data. For example, 
if we assume Wq = 1, Wk = for k ^ 0, then pk will 
vanish for k > N. As seen from Eq. (15), each pk costs 
N multiplications and N additions to calculate. This 
translates into a computational cost, in terms of FLOPs 
per unit time, of ~ N/ At. 

For certain types of waveforms, Infinite Impulse Re- 
sponse (IIR) filters can be used to dramatically reduce 
computational cost. The simplest IIR filter is a first- 
order recursive algorithm, in which the fc-th output i/k is 
a linear combination of the (k — l)-th output, yu-i and 
the A;-th data, Wk- 



!Jk 



= e -h-i^y k _ 1 + WkAt , 



(16) 



where 7, f2 are real- valued constants with 7 > to ensure 
stable solutions. It can be shown that, e.g., by using 
tools of Z-transform [12] , as long as Wk does not diverge 
towards k — > —00, then even if the recursion starts at a 
finite time step, after an initial transient of several times 
I/7, the output of the filter achieves a steady state of 



y k = Wje (l - in)(i - k)At At. 
j'=— 00 



(17) 



Note this is the discretized version of the continuous in- 
tegration 



y(t) 



;(f')e(T- m )(*'-*)e(t (18) 



Note first that Eq. (17) indeed gives an infinite impulse 



impulse, wq = 1 and Wk = (k ^ 0), the output of the 
filter, even at very late time steps, never vanishes. More 
over, by comparing this with Eq. (12), the IIR filter can 



be viewed as a template of a damped sinusoid: 



u(t) 



e(-t), 



where Q(t) is the Heaviside function 

0, t < 0; 

1, t > 0. 



e(i) 



(19) 



(20) 



The IIR filter described above requires only one complex 
multiplication and one summation per sampling time, 
which means the computational cost is ~ 1/ At. 

For a simple proof of concept on the computational ef- 
ficiency of the IIR over the FIR filtering technique, we 
examine the case when we do need to filter for a damped 
sinusoid signal with frequency Q, and decay rate 7. The 
data to be filtered has a duration of at least on the order 
of I/7. The Nyquist sampling theorem limits the sam- 
pling interval to be at most ~ 1/S7, meaning that the FIR 
template would have the number of data points several 
times larger than fl/"f. Subsequently the computational 
cost of the FIR in FLOPs per unit time is larger than 
51 2 /7. An IIR filter, on the other hand, only has a cost 
of O, which means the cost of IIR filter is 7/Sl ~ 1/Q 
times that of FIR filter, where Q = fi/7 is the quality 
factor of the damped sinusoid. As a consequence, if we 
can convert our waveforms into a sum of a series of of 
high-Q damped sinusoids, IIR filters can be used over 
the FIR to dramatically reduce the computational cost. 



III. CONSTRUCTION OF IIR FILTERS FOR AN 
INDIVIDUAL INSPIRAL WAVEFORM 

The simple IIR filter discussed in the previous sec- 
tion has the special waveform of a decaying sinusoid 
[Cf. Eq. (19)]. In this section, we will show that a chain 



response, because for a data series containing only one 



of IIR filters can be used to "piece together" the wave- 
forms of compact binary coalescence. This is possible be- 
cause these waveforms are basically sinusoids with slowly 
varying amplitude and frequency. For simplicity, in this 
paper, we will restrict ourselves to Newtonian Chirps. 



A. The Newtonian Chirp Waveform 

The Newtonian-chirp is the leading-order waveform 
from a compact coalescing binary. In the time domain it 
can be written as the real part of the complex expression 
(see, e.g., 0, Sec. C), 

u(t) oc (t c - r i/4 r .2(5M c )- 5 ' 8 ( ic -f' 8 + * = A(ty*M 

(21) 

where we follow the convention of the Planck unit that 
sets gravitational constant G = 1 and the speed of light, 



5 



c = 1, M c is the chirp mass of the binary, 

M c = Mrj 3/5 (22) 

which depends on the total mass of the binary M and r\ = 
the symmetric mass ratio. The signal finishes 
at the ending time t c , and <p c is the constant phase at the 
end time. Here we have ignored time-independent factors 
of proportionality in the amplitude, which do not affect 
template construction. 

We have assigned real- valued functions A(t) and 
to denote the amplitude and phase of the waveform. Al- 
though the actual waveform is the real part of u(t), we 
have intentionally kept its complex form, because the 
imaginary part of u(t) represents the waveform of a bi- 
nary with a phase shift of 7r/2 from the real part — there- 
fore the real and imaginary parts together form a basis 
for the linear space of signals of all phases. This is a 
feature of all adiabatic waveforms, which satisfy 



B. An IIR filter chain 



A/(siA) < 1, ti/n 2 < 1. 



(23) 



In other words, the amplitude A(t) and angular fre- 
quency <&(i) both evolve at rates much slower than the 
instantaneous frequency This allows us to use the 
Stationary Phase Approximation (SPA) to compute the 
Fourier Transform of the waveform in Eq. (21), 



«(/) oc r r/e^Af-^ + 2,ft c+4 > c -,/A) ^ / > , (24) 



where 



A= ^(8ttM c )- 5 / 3 



(25) 



is the intrinsic parameter we need to search for in the 
case of Newtonian chirp. Note that when we Fourier- 



transform the complex signal of Eq. (21), there is only 



positive- frequency component, with u{f) = for / < 
On the other hand, if we took the real part of the signal, 
we would have u(—f) — u*(f) for / > 0. 

The duration of a coalescence GW signal can be well 
approximated as a function of chirp mass M c (Eq. (22)) 
and the detector's minimum cut-off frequency / m i n , 



T(M c ,/ n 



647013 



(/mi„/Hz)8/3(M c /M )5/3 



s. 



(26) 



One can see that for a fixed / m i n , the longest signal dura- 
tion corresponds to the smallest chirp mass. The sample 
signal durations for the initial, advanced and future GW 
detectors of various / mm can be found in Table |IJ column 
3. It is shown that GWs from a canonical (1.4+1.4)Mq 
NS-NS binary system will have a duration 40 times longer 
for advanced detectors, and possibly 10000 time longer 
for the future ET detector than that of the initial detec- 
tor. 



The adiabatic condition in Eq. (23) also implies that 



the waveform can be divided into constant-frequency in- 
tervals: within each interval it can be approximated as 
a sinusoid with constant frequency, while neighboring in- 
tervals have slightly different frequencies. This further 
indicates that we can attempt to write the entire wave- 
form into the sum of a series of damped sinusoids: the 
frequency of each sinusoid corresponds to a constant- 
frequency interval, the ending time of the sinusoid cor- 
responds to the ending time of this constant-frequency 
interval, while the decay time should be comparable to 
the length of the constant-frequency interval. The ampli- 
tude of the decaying sinusoid can be set to be comparable 
to the amplitude of the original waveform during the cor- 
responding constant-frequency interval. 

Mathematically, our target is therefore to approximate 
the signal template u(t) with the sum of a chain of IIR 
filters [Cf. Eq. (Ill)], which we denote by U{t): 



M 



U{t) ee Bte^-^W-^Qiti - t). 



(27) 



1=1 



Here the chain consists of M filters; for filter I (1 < I < 
M), B[ is the amplitude of the filter I, fi; and 7/ are the 
angular frequency and decay rate, and ti is its ending 
time. 

As a first step, let us determine the relevant portion of 
the signal that we need to approximate: this is bounded 
by the low frequency cut-off f m - ln , below which the chirp 
only contributes negligible signal-to-noise ratio, as well as 
the high frequency cut-off / max - The minimum frequency 
/min is normally determined by the seismic wall of the de- 
tector which is set to be 40 Hz for initial LIGO, 10 Hz for 
Advanced LIGO, and might extend to lower frequencies 
in future detectors, such as the Einstein Telescope (ET) . 
The maximum frequency / max is either determined by 
the end of the Newtonian chirp or the upper end of the 
detection band. In this paper we set / max = 2000 Hz. 

Now suppose our Newtonian chirp has a particular 
value for the intrinsic parameter A, and t c — 0, <p c = 0. 
Let us define t$ = t- m i as the time at which the instanta- 
neous frequency of the waveform is equal to f min (which 
means \t \ = —to is the duration of the Newtonian chirp 
from /i n i to coalescence), and incrementally define 



ti — ti-i + Ti 



*(t,)2? 



e < 1 , I 



1,2,. 



until we reach tM, which corresponds to a frequency at 
or beyond / max . These intervals, 



(29) 



will be the constant-frequency intervals described previ- 
ously. The parameter e should be substantially less than 
unity, so that the phase error caused by assuming a con- 
stant frequency is significantly less than one radian. 



G 



For t £ [ti-i,ti], we expand $(t) at t* = t\ — aTi (where 
a is an ad hoc parameter to be adjusted later) 

*(t) * + - tf) + \$(fi)(t - U) 2 (30) 

such that the first term is a constant phase, the sec- 
ond term gives a single angular frequency of <&(£;*), while 
the third term gives the error of a single-frequency ap- 
proximation, which will be small if e is small enough in 
Eq. ( |28j ). We will then use SI/ = -<&(**) as the oscilla- 



tion frequency of the IIR filter assigned for this constant- 
frequency interval, and prescribe a complex amplitude of 



Bi = A(fi)e 



i*(tn-ifii(t(-«D 



(31) 



These will assemble into 

Bie -tn l (t-t l ) _ * ) e »* (**)+**(**){*—**) 

« -4(i)e**«, t,_! <t<t,. (32) 

We must still add a Heaviside function and a damping 
component to modify ( [32] ) into a form realizable by an 
IIR filter. Since the validity of (32) is between ti-\ and 
ti, it is natural to have the Heaviside function cut off 
values for t > ij, and to have the damping component 
have a time constant comparable to TJ, which gradually 
cuts off the filter at i < Prescribing 

7/ - C/Tt , (33) 
with £ yet another ad hoc parameter, we write 

Ui(t',A,t e = 0,<f> c = 0) 
= B J e- <n '< 1 -* l >-'»(* | - t >e(t I -t) (34) 

which is our IIR filter for interval Z, for chirps with pa- 
rameters A, t c — 0, = 0. Summing over all Ui, we 
obtain an IIR chain that approximates the entire com- 
plex chirp signal: 



U(t;A,t e = 0,4> B = 0) 



M 

E 

i=i 



Ui{t;A,t e = 0,<l> c = 0). 



If the sum of the complex filter chain U(t) indeed approx- 
imates the complex chirp signal u(t) [Cf. Eq. (21)], then 
the real and imaginary parts of the output from the fil- 
ter chain will be good approximations for filtering chirps 
with <p c = and 7r/2, respectively. 

For non-zero t c , we will have to apply 



U{t\A,t c ,4> c = Q) = U(t- 



O,0 C = O) (36) 



Note that having Heaviside Function 0(t — t c — 1{) within 
U[ means we have to collect the IIR filter result of filter I 
at ti+t c . The fact that all ti are negative means all results 
are obtained before the coalescence (which happens at t c ) 
and hence IIR filtering itself causes no latency — except 
for the small latency due to over- whitening, as stated 
previously (sec. II B) 



Filtering for general signal phases and goodness 
of match 



Since the construction of the IIR filter chain is of an ad 
hoc nature, we must test how well the resulting IIR filter 
chain U can approximate the original signal u. A natural 
candidate would be imposing that the match between the 
signal u and the template U 



Pcplx — 



\{u\U)\ 



y/{u\u){U\V) 



(37) 



must be close to unity. 

However, this needs to be connected to the signal-to- 
noise ratio achievable by IIR filtering. For doing so, we 
must first elaborate how to use the output of the complex 
IIR filtering to recover signals with arbitrary phases. If 
we write 

u = u r + iui (38) 

with u r> i represent the real and imaginary parts of u in 
the time domain, and similarly, 



U=U r + lUi 



(39) 



then the true signal of arbitrary phase is a linear combi- 
nation of u r and written as Aiu r +A2Ui, and we should 
use a linear combination of the real and imaginary parts 
of U, namely B\XJ r + B^lJi as the search template. For 
any particular coefficients A\^,, the optimal overlap is 
given by 

(A lUr + A 2 u l \B 1 U r + B 2 U l ) 

Pnu(Ai, A 2 — max — 

Bi.2 ^J{B 1 U r + B 2 U l \B 1 U r + B 2 U l ) 

(40) 

The worst-case scenario is given by a minimization over 
(A 1 ,A 2 ): 



Pn°R St = nun 



pim(Ai,A 2 



A ^ ^(Axur + A 2Ul \A lUr + A 2 Ui) 



(41) 



In fact, when the signal and the template are both highly 
adiabatic, it can be shown that puk(Ai,A 2 ) is approx- 
imately independent of Ai^ 2l and that to a very good 
accuracy: 



„ „worst 
Pcplx ~ FUR ■ 



(42) 



Eq. (41 ) is therefore used to calculate the goodness of the 



match of the IIR filter chain. 



D. Implementation for (1.4 + 1.4)M S binaries and 
initial LIGO 



We first apply the prescription described in Sec. IIIB 
to construct an IIR filter chain for (1.4+ 1.4)M Q binaries 
for initial LIGO and use Eq. (41) to test their overlap 
with the true signals. We choose (by hand) a = 2.3, 
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Rate 

Type 


Sk (s ) 


16 


32 


64 


128 


256 


512 


1024 


2048 


4096 


8192 


Mot 


Total 

VyObL 


f /H? 
J / nz 


2—4 


A » 

4 — 


8 — 16 


16 — 32 


32 — 64 


64 — 128 


izo — zoo 


zoo — Olz 


c.19 1 (194 




iLIGO 


FIR 


AVlR.k 










4547 


3062 


965 


304 


96 


30 


9004 


20 


LFIR.,fc 










A *7 
4. i 


D.O 


a n 
4.U 


2.0 


1.0 


l.U 


IIR 


AilR.k 










71 


62 


34 


19 


10 


4 


200 


2.4 


Cur 










n 99 


u.oo 


n 49 




n 40. 


u.oy 


aLIGO 


FIR 


AT 

L VpiR. : k 






45835 


30868 


9 1 16 


3062 


900 


9 A /I 

3U4 


96 


on 
30 


90883 


53 


r> 

^PIH,,fe 






11.7 


15.8 


10.0 


6.3 


4.0 


2.5 


1.6 


1.0 


IIR 


AlIR.k 






220 


198 


111 


62 


34 


19 


10 


4 


658 


3.0 








0.17 


0.30 


0.34 


0.38 


0.42 


0.47 


0.49 


0.39 


ET B 


FIR 


A^FIR : k 


213010 


311130 


98000 


30868 


9723 


3062 


965 


304 


96 


40 


667198 


120 


CpiR^fc 


13.6 


39.8 


25.1 


15.8 


10.0 


6.3 


4.0 


2.5 


1.6 


1.0 


IIR 


iVlIR.k 


392 


631 


353 


198 


111 


62 


34 


19 


11 


3 


1814 


3.3 


ClIR 


0.08 


0.24 


0.27 


0.30 


0.34 


0.38 


0.42 


0.47 


0.54 


0.29 



TABLE II: Break-down of number of niters and computational cost (over successive two-fold down-sampling channels) of 
multi-rate FIR and IIR filtering, of a single template for a (1.4 + 1.4) Mq binary for initial, Advanced LIGO and the Einstein 
Telescope. See text in Sec. Ill D| Here computational costs for each type of filtering and for different sampling channels are 
calculated using Eqs. ( 44 1— ( 4T I , with numerical values quoted in units of MFLOPS or 10 6 FLOPS. The minimum overlap is 
0.99. 



e = 0.269 and C = 4, an overlap of 0.99 is achieved with 
A/iir = 200 IIR filters. 

We next estimate the computational cost required by 
such IIR filtering. We focus on the floating point op- 
eration count per unit time required to generate com- 
plex outputs from the sum of individual IIR filter out- 
puts of Eq. (16). Here we assume the maximum sam- 



ple rate for compact-binary coalescence data analysis is 
8192Hz, with 2x down-sampling applied successively to 
provide channels with sample rates of 4096 Hz, 2048 Hz, 
. . . , 256 Hz. The IIR filter bank is divided into 6 groups, 
each corresponding to a frequency band of 2 fc+5 -2 fc+6 Hz, 
for k = 0, 1, 5. For filters in group fc, we assume they 
are applied to the channel with sample rate of 



Sk — 2 



fc+8 



Hz. 



(43) 



In Table |TTJ we list the actual number of IIR filters re- 
quired to achieve a minimum overlap of 0.99 at different 
frequency band with downsampling technique. For com- 
parison, we list the corresponding numbers for the FIR 
method also applied with downsampling technique. 

At each time step, each IIR filter needs to per- 
form a total of 12 real-number multiplications and addi- 
tions namely: 4 real-number multiplications plus 2 real- 
number additions for multiplying the current output by 
the complex recursive coefficient, 2 real-number multi- 



plications for multiplying data (second term in Eq. (16)) 
with a complex normalization coefficient to yield proper 
SNR output, 2 real- number additions for combining the 
previous two products, while finally 2 real-number ad- 
ditions for adding the result of this filter into the total 
output. 

If we ignore costs for down- and up-sampling, which 
are performed relatively rarely, the total computational 



cost for initial-LIGO filters in Table HTl is 

5 

C nR = 12S kNim,k * 2.4 x 10 6 flops. (44) 



fc=0 



On the other hand, if we carry out the same down 
sampling scheme for FIR filtering, the number of points 
in group will be 



FIR,0 



= S ■ [£(64 Hz) -h 



(45) 



where t(64Hz) is the time at which the instantaneous 
frequency is 64 Hz. For k = 1, 2, 3, ... 5, we have 

AW = S k ■ [t(2 k+6 Hz) - t(2 k+5 Hz)] (46) 

At sample rate Sk, for each time step, we have to perform 
two real- valued correlations with array length A^piRk, 
which cost 4A 7 FiR,k floating point operations. The total 
computational cost of FIR filtering is therefore 



Cfir = 4 ^FiR,k - 2.0 x 10 7 flops. (47) 

fc=0 

This is nearly 8 times the cost of the IIR filter method as- 
suming downsampling technique applied to both filtering 
methods. The result of above cost estimation for the IIR 
and FIR filtering are also listed in Table \H\ We will show 
in the next subsections that the improvement is much 
more significant for advanced detectors as they venture 
into lower frequencies. 



E. Dependence on initial frequency and future 
detectors 



As initial frequency / m m is lowered in future 
gravitational-wave detectors, we anticipate much longer 
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FIG. 1: Theoretical (dashed curves) and numerical (labeled 
by "+" 's) scaling of the computational cost with / m i n for the 
FIR (red color) and IIR (blue color) method for one tem- 
plate, fixing / max = 2000 Hz. The theoretical scaling is based 
on Eqs. (491 and (51 1 (see Sec. HIE 2 1, numerical values are 
taken from Table H[ column 15. 

signals (see Table [T]) , and therefore a possibly dramatic 
increase of computational cost. In this subsection, we 
will first obtain analytical scalings in IIR and FIR com- 
putational costs, assuming an idealized down-sampling 
scheme. We will then provide more realistic estimates of 
cost by constructing actual IIR filters and adopting the 
same successive 2x down-sampling strategy. 



1. Analytical Estimates 

Ideally, the minimum sample rate is twice the instanta- 
neous frequency of the signal, or S — 2/. For FIR filters, 
we have 



N Fm > 2N C - 



Qdt 



dO~rf 3 .(48) 



Converting the summation Eq. ( 47 ) into integral, we ob- 
tain: 



C 



FIR 



/ 



FIR 



/x 



-2/3 



(49) 



For IIR filters, during a dephasing time of T = 

we use one filter AiVnR = 1 and Af2 = (IT = 
which leads to 




iVi 



IIR 



dNuR 

dn 




-5/6 



The computational cost of IIR filtering is 

Cur ~ / fieWiiR ~ /mix - fmil- 



(50) 



(51) 



Note that for IIR filtering, the positive power law means 
the computational cost scales predominantly with the 
higher cut-off frequency, instead of the lower cut-off fre- 
quency — we therefore expect the computational cost 
not to increase dramatically when / m i n is lowered, if we 
already have / max > / m in- 



2. Numerical Estimates 

More detailed constructions for Advanced LIGO and 
Einstein Telescope (ET) have been carried out, follow- 
ing Sec. |IIID| assuming / min = 10 Hz for Advanced 
LIGO and 3 Hz for ET. Assuming the same successive 
2x down-sampling strategy, we evaluate the single tem- 
plate computational cost for (1.4 + 1.4)M© binaries for 
both FIR and IIR filtering. As it turns out, using the 
same e = 0.269, but {a,Q = (2.5,4.25) for Advanced 
LIGO and (a,C) = (2.25,4.5) for ET, will still give us 
match above 0.99. 

The number of filters in each down-sampling band, as 
well as computational cost break-down for a single tem- 
plate are shown in the second and third tiers of Table |Hj 
for Advanced LIGO and ET, respectively. We also com- 
pare our numerical values with scaling laws predicted in 
which are plotted in dashed curves 
[We determined the normalization of the theo- 



Eqs. {49 1 and (|51 
in FigTJl 

retical formulas using numerical values of computational 
cost at / m i n = 40 Hz.] The agreement is remarkable, 
especially considering that our successive 2-fold down- 
sampling is not continuous, and therefore rather non- 
ideal. 

As we can see from Table [H] and Fig. [T] the IIR re- 
duces computational cost from (multi-rate) FIR filtering 
by factor of 8 for initial LIGO. As we move to lower start- 
ing frequencies, the saving factor increases to 18 and 40, 
respectively. The single-template cost, even when we ex- 
trapolate / m in to the rather unlikely 1 Hz, stays at several 
MFLOPS. 



IV. INTERPOLATION BETWEEN IIR FILTERS 
OF DIFFERENT INSPIRAL WAVEFORMS 

In order to search for all possible kinds of compact bi- 
nary coalescence, we must match the signal with a family 
of templates parametrized continuously by the parame- 
ters of the binary, e.g., their masses. In practice, al- 
though maximization of match over certain parameters 
(e.g., orbital phase of the binary) can be done analyt- 
ically, for the rest of the parameters, we must sample 
them discretely, and build a template bank — and match 
the signal with each member of the bank. The den- 
sity of the discretization is usually determined by im- 
posing that each member of the continuous family can 
be approximated well enough by at least one member of 
the bank, with mismatch less than a maximum tolerable 
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For advanced detectors, the number of templates can 
be as large as 10 5 [13] posing a significant computa- 
tional challenges. Interpolation strategies have there- 
fore been conceived (e.g., |14H17) 1 to reduce the num- 
ber of templates, based on the fact that signal-to-noise 
ratio is a continuous function of the parameters being 
searched over. More specifically, if we refer to the bank 
constructed by imposing the maximum tolerance of mis- 
match e max as the fine bank, then the hope is that even 
if match is calculated for a coarse bank in which parame- 
ters are less densely populated, the signal-to-noisc ratio 
of the fine bank can still be recovered by interpolation, in 
such a way that the total cost of computing coarse-bank 
SNRs plus interpolating fine-bank SNRs is less than the 
cost of directly computing fine-bank SNRs. 

Our interpolation method differs from previous work 
in that we divide each coarse-bank template into sev- 
eral sub-templates in frequency (thus time) domain, 
and recover fine-bank SNRs using SNRs from the sub- 
templates. This approach has been inspired by the SVD 
approach [10l E] adopted by the LLOID [7j and the 2- 
bank interpolation in MBTA [H| methods. We will show 
that, although the division into sub-templates increases 
the cost of recombination, it allows a much coarser bank 
— and finally decreases the computational cost by a large 
factor. 



A. Template banks in general 

To develop a scheme to discretize the parameter space 
without losing detection efficiency, we must know how 
much the SNR is reduced by using a template whose pa- 
rameter values differ from those of the signal. We define 
the mismatch between two normalized templates of dif- 
ferent sets of parameters as 



e = 1 - <u(A)|u(A')>. 



(52) 



The template u is specified by a parameter vector A. If 
A' is near to A, we can Taylor expand e at A and have 
the approximation to second order of AA = A' — A as 



1 d 2 e 



2 dXidX, 



AAjAA, 



(53) 



AA=0 



from which we define a (positive definite) metric in the 
parameter space 



1 d 2 e 



2 dXidXj 



(54) 



AA=0 



Equations ( 53 1 and ( 54 1 indicates that mismatch between 



neighboring points in the parameter space can be viewed 
as distance measured by metric 7. 

Suppose we would like to place a template bank in 
a /^-dimensional parameter space, with a mismatch no 
higher than e, then the most straightforward strategy 
would be laying down a cubic grid with proper side length 



dl measured by the metric 7^ , such that template placed 
at each grid point will be able to cover a cube whose 
vertices are centers of neighboring cubes. This means we 
have 



D(dl/2) 2 



(55) 



The volume spanned by each cube (according to metric 
jij) is therefore 



AV = dl D = {2^/D) D . 



(56) 



The total number of templates in the bank would be the 
total volume of the parameter space divided by the vol- 
ume of each cell, or 



TV = 



V tot _ fd D \^/det\\ 7lJ \\ 



AV 



(57) 



B. Newtonian Chirps 

Through the Stationary-Phase Approximation |18) . 
the Fourier Transform of a Newtonian Chirp can be writ- 
ten as 

u(f;A,t c ,cb c ) r'/^ A f- 5/3 ^^\ />0, (58) 

and u(f) = u*(—f) for / < 0. The mismatch between 
two neighboring templates with parameters (A, t c , (f>o) 
and (A + AA, t c + At c , 4> c + Acj>o) can be written as 



s(AA,At c ,A<j> ) = 1 



f- 7 / 3 cos A$ 



df 



-7/3 



(59) 



Sh(f) 



df 



where 



A$ = f- 5/3 AA + 2irfAt c + A<j) c 



(60) 



Expanding Eq. (59 1 up to second order in A$, we obtain 



by comparing with Eqs. (53) and (54) the metric 

hn\\ 



J(-f ) J(-3) J(-4) 



* '(-!) 



(61) 



where "*" indicates terms obtainable by symmetry, and 



f— jp_ 



An« f-7/3 



(62) 



and we have used i = 1, 2, 3 to label AA, 2nAt c and A(f> c , 
respectively. Note the metric depends on the frequency 
division and noise spectral density only. 

Here among the three parameters, search over (j) c is 
done analytically, as discussed in Sec. |III C| while search 
over t c is carried out systematically at the sample rate 
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— the only parameter left to discretize is A. Therefore, 
A is an intrinsic parameter as described previously. The 
correct way to place templates along intrinsic parameter 
directions is to "project out" the intrinsic parameters, as 
discussed, e.g., by Owen and Sathyaprakash [T3"] . 

In our case, the projected metric along direction A is 
one dimensional given by 



511 = 7n 



7? 3 722 



2712713723 



7^2 733 



722733 



7l3 



(63) 



which depends on / m i n , / max an d the noise curve Sh 
through I(/3). Following Eq. (57), the number of tem- 



plates required to achieve a mismatch e is then 



TV = 



-^min) 



2^e 



(64) 



where A m - m and A max are the minimum and maximum 
values of A. Here we can be more specific about template 
placement along the A direction. Given any A, which 
is associated with a member of the template bank, and 
suppose its mismatch with a neighboring template with 
A ± AA is e max , or 



g u (AA) 2 = £ n 



(65) 



then neighboring templates should be placed at A±2AA, 
therefore we have 



A, 



2AA 



(66) 



which recovers Eq. ( 64 ) 



Here we give the noise spectral density we use for initial 
LIGO, Advanced LIGO, and Einstein Telescope (ET B ). 
For the initial LIGO [19], we have x = //(150 Hz) and 

S h (f) = 9-10~ 46 [(4.49a;)- 56 + 0.16a;- 4 - 52 + 0.52 + 0.32x 2 ] 

(67) 

For Advanced LIGO [20], we have x = //(215Hz) and 



S h (f) = 10 



-49 



5x 



111 



2 X 



1 T 2 
2 



Note this is different from what is used in [7]. As a re- 
sult, two methods are dealing with different number of 
templates for the same parameter space. This should be 
taken into account when we compare the computational 
cost of the two methods. For the Einstein Telescope [3T], 
we have x = //(100 Hz) and 



2.39 x 10 



1.76a;- 



-27^,-15.64 



0.409a: 1 



0.349aT 2145 
(69) 



Applying Eqs. (64) and (65) to these three detectors, 



we can show that the number of templates increase by a 
factor of 3.9 when we upgrade from initial to Advanced 
LIGO, and another factor of 4.4 when we upgrade from 
Advanced LIGO to the Einstein Telescope. These num- 
bers are listed in Table [T] column 5. 



C. Subtemplates 

1. General Discussion 

Now suppose we divide our entire signal frequency in- 
terval, (/min, /max) into M segments of 



[/oi/i]j 1/11/2] 



[Jm-i, Jm] 



(70) 



with /o = /min and /„ = / max . (When we later apply this 
to IIR filter chains, M will be much less than the total 
number of filters, N.) For any template u, we define 
sub-template uj, J — 1, . . . M, to have the same value as 
template u within the frequency interval [/j-1,/,7] but 
have zero values elsewhere, 



«./(/) 



u(f), fj-i <f< fj, 
0, otherwise. 



(71) 



Now let us consider two neighboring templates, u and v, 
their J th - sub-innerproduct can be naturally defined as an 
integral over frequency segment J: 



(u\v}j = (uj\v,j) = 4Rc 



fj 



df 



u*(f)v(f) 
S h (f) 



(72) 



This sub-innerproduct can also be regarded as the con- 
tribution to the full inner product (u\v) from segment J 
[Cf. Eq. and 



M 

j=i 



u\v)j 



(73) 



We denote u and u + Au as neighboring templates, and 
we also define their J th -sub-mismatch specific to interval 
J, in the intrinsic parameter space, as 



1 



Au) 



(74) 



\J (u\u)j(u + Au\u + Au) j ' 
which is equal to the "ordinary" mismatch between uj 



(68) and uj + Auj as defined in Eq. (52). Up to second 



order in Au, we can show that the total mismatch and 
the J th -sub-mismatch are 



1 (Au|Au) 

2 (u\u) 

1 (Aw I Aw) ,7 

2 (u\u)j 



Using Eq. ( 73 ) , we can show that 

(u\u)j 



M 

E 

7=1 



Since 



M 1 1 \ 

,7=1 



(UU) 



= 1, 



(75) 
(76) 

(77) 

(78) 
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the overall mismatch is therefore a weighted average of 
the sub-mismatches. This means to achieve an over- 
all mismatch of e, we only need to make sure the sub- 
mismatches £j average to s. This has dramatic implica- 
tions in the sense that it allows the overall mismatch 
to be maintained by (1) dividing the frequency band 
into several frequency intervals with non-uniform sub- 
mismatches, (2) reducing the size of frequency intervals 
to allow larger step size for intrinsic parameters. These 
lay the foundation for our template interpolation method. 




FIG. 2: Illustration of the phase function $(/) vs frequency 
for the presumed parameter A (blue solid line) and its neigh- 
boring parameter A + AA (red line), the linear shift of the 
blue line to match the red line (green dashed line), and a 
piecewise approximation (black dashed line) of the red line 
by shifting segments from the blue line. It shows that with 
smaller frequency intervals, it is easier to match phases arising 
from different intrinsic parameters. 



To qualitatively understand the reason that the grid 
size for intrinsic parameter placement can be enlarged 
when we restrict ourselves to smaller frequency intervals, 
we first note that in the frequency domain, it is the phase 
that we need to match, while the amplitude as a func- 
tion of frequency is the same for all parameters. We note 
that the phase of u(f), which we denote by $(/), is de- 
termined by A, as well as t c and <f> c (Eq. (58)). In Fig. [2] 
we plot the phase $(/) for a particular set of parame- 
ters (A, t c , (f> c ) in blue and also for a neighboring set of 
parameters (A + AA, t c , 4> c ) in red. If we were to use the 
template with parameter A to search for a signal with 
parameter A + AA, we could shift </> c and t c used in the 
search, which corresponds to shifting the blue curve by a 
linear function in frequency. The green dashed line illus- 
trates a reasonably optimal attempt — yet the difference 
between the green curve and the red curve cannot be 
reconciled very well due to the fact that linear functions 
do not correct for curvature. However, if we divide the 
frequency range into several intervals, and allow differ- 
ent values of At c and A</> c to be applied to each interval, 
then sub-templates with A can achieve rather low sub- 
mismatches with signal with A + AA. This corresponds 
to the fact that a curve can be better approximated by 



straight lines when divided into smaller intervals. 



2. Newtonian Chirp in the Frequency Domain 

Let us now focus on a particular frequency segment J, 
with fj—i < / < fj, and work out the relation between 
A A and ej, as A<fi c and At c are allowed to readjust their 
values (to be different from other segments). This simply 
requires us to repeat the procedure in Sec. |IVB] for each 
segment: with A A, At c and A(j> c , we have the J th -sub- 
mismatch of 



sj = [ AA 2nAt c A<p c } Y 



AA 

2wAt c 

A(j> c 



with 



r 



hi-*) Ji(-4) 



and 



1 


" f fj dffP ' 




- rfj df f-7/3 


2 


Jfj-i s h (f) 


1 


Jfj-. s h (f) 



(79) 



(80) 



(81) 



Note that the above are identical to Eqs. ( 59 )— (61 1, 
except with integrations restricted to the interval of 
[fj-iifj]- 

The next step is similar to the "projection" process 
described by Owen and Sathyaprakash, but restricted to 
interval J. With Eq. ( 79 1 , we ask the following question: 



if we are allowed to freely re-adjust individually the val- 
ues of At c and A<j> c for interval J of the template (i.e., the 
J th -subtemplate), what would be the J th -sub- mismatch 
achievable for AA, and what should the corresponding 
Acj> c and At c be. 

The answer to the question is readily obtainable by a 
maximization of the mismatch e over At c and A0 C , fixing 
AA. This results in adjustments of 



" 2ttA^ " 




' 122 


7 2 J 3 " 


-1 


' 7f 2 " 






. 7 3 J 2 


73 J 3_ 




. 7i3 . 



AA 



which result in the J th sub-mismatch of 
ej = g{ 1 {AAf, 

with 

./ _ j h(z] 2 l22 - 27i 7 2 7i 7 3 7 2 J 3 + [7i J 2 ] 2 73 7 3 
9\x = In 



7 2 J 27 3 J 3 - [7/3] 2 



Following Eq. ( 77 1 , we have the total mismatch 
e = gf 1 (AA) 2 , 

where 

9u(u\u)j 

J 



9x 



(82) 

(83) 
(84) 
(85) 
(86) 
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is an effective metric coefficient for any division of the 
frequency band. More specifically, gff describes the mis- 
match achievable by individually adjusting and At£, 
for each interval of the division. It can be shown that in 
general a finer division of frequency intervals always gives 
As a consequence, if we define 



a smaller gf±. 



AA cb = 



eff 

#11 



(87) 



with the subscript "cb" indicating coarse bank, then 
AA c b will be greater than AA given by Eq. ( [65] ) where 
gu is evaluated using the full frequency band. In order to 
make a distinction, we shall rewrite that same equation 
as 



AAfb = 



5ii 



In 



but adding a subscript "fb" to indicate the fine bank 
order to maximize AA c b for a maximum mismatch £ max , 
we should simply minimize gff globally, over all possible 
frequency division schemes. Because a template at A in 
the fine bank covers (A — AAfb, A + A Aft,), the ratio of 
the number of templates in the coarse bank to that of 
the fine bank is, 



(89) 



In summary, given a required maximum mismatch 
£ max with a particular frequency subdivision, by adjust- 
ing A0£ and At J c individually, a single template at A 
can cover the region (A — AA C \ J1 A + AA c b). For a signal 
with \AA\ < AA c b, the J*^-sub-template for parameter 
A + AA can be constructed by adjusting At J c and Acf)^ of 
the sub-template of template A using Eq. (82 1. The in- 




terpolated template of parameter A + AA is therefore the 
sum of the constructed sub-templates from a coarse-bank 
template A 

u(f;A + AA,i c ,0 c ) 

M 

= %(/; A *c + At J c (AA), <f> c + A^ 7 (AA)) 

,7=1 
M 

= Y / ^(f;AtcAc)e^ fAt ' ( - AA)+lA4 '' {AA) . (90) 

,7=1 

It is straightforward to establish the following proper- 
tics of the effective metric: (i) gff always becomes smaller 
when we insert one or more dividing frequencies into an 
existing division of [/ m in, /max], (ii) if we continue to de- 
crease the maximum size of intervals, we can decrease 
<7if indefinitely [in fact, for small intervals, g( x scales as 
(A/) 5 , which means <?ff should scale as (A/) 4 , and hence 
AA scales as (A/) -2 ]. Furthermore, for template fami- 
lies with more than one parameter, it is straightforward 
to generalize our result to 



off = 9abW u )J 
Jab /„.|,A 



(91) 



with the number of templates in the coarse bank given 
by 



det||flr &|| 



(92) 



D. Application to IIR filtering technique 

In this section, we will apply the formalism developed 
in the previous subsection and discuss how we can im- 
plement IIR filter chains only for a much coarser bank 
of templates — while still obtaining SNRs for the entire 
fine template bank. Discussions made in the previous 
sections, although strictly speaking only apply to sharp 
divisions in the signal frequency band, still qualitatively 
apply to IIR filters that work in time-domain. The trick 
is to replace frequency intervals in the previous section 
by groups of IIR filters. This approach will work as long 
as we include enough number of filters in each "group" , 
so that overlaps between different groups are relatively 
unimportant. We note that, as is the case for the con- 
struction of IIR filter chains, the construction of the in- 
terpolation scheme by itself does not justify its efficiency 
— a separate test of achievable match will be carried out 
explicitly after the interpolation scheme is constructed. 

To be more specific, we re-group the entire chain of 
N IIR filters into m sub-groups, with group J includ- 
ing those whose oscillation frequency lies within the fre- 
quency interval J defined in Sec. |IV C| In other words, 
group J of IIR filters can be written as 

Vj{t;A,t c ,4> c ) 

J2 Ui{t;A,t c ,<j> c ) 

|e[/j-i,/;] 
l j 

= ^M,*c0c), J=1,...,M, (93) 

l=h-i+i 

where we have Iq = 0. We will treat Vj as corresponding 
to the uj(f; A,t c ,<p c ) of Sec. IV C As a consequence, 
from Eqs. (82 1 and ( |90| ), signal u(t;A + AA,t c ,<p c ) 
can be interpolated by the IIR filters constructed for 
u(t;A,t c ,4> c ) 

u{t;A + AA,t c ,(i) c ) 



^2Vj(t;A,t c + Ati,d> c + A^) 

,7=1 
M 

Y,e iMi Vj{t;A,t c + AtlAc) ■ 



(94) 



,7=1 



Here At J c and A0^ should be computed from AA using 
Eq. ||82]). 



In practice we can easily generalize the coefficients in 
front of Vj to further reduce the overall mismatch, by 
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U!(A + pAA ch ) 




3 




U h (A + pAA ch ) 







multiply by d^qAA^) 
shift by A^(<jAAfb) 



data 



U h _ 1+1 (A + pAA c 



U h (A + pAA ch ) 



Vj 



multiply by dj{qAA ih ) 
shift by At J c (gAAfb) 



Ui M _ 1+1 (A + P AA ch ) 



U lM {A + pAA. 



cb) 



V, 



M 



multiply by d M {qAAfb) 
shift by Aff^A^fb) 



filter result for 
A + pAA cb + qAAfb 



IIR filtering 
(coarse bank only) 
p=\,2,-,N c h 



reconstruction 
(all fine bank members) 
9 =±l,...,±7V fb /(2AT cb ) 



FIG. 3: Schematic diagram of the IIR filtering process for a template with parameter A + pAA c ^ + qAA^. The first part is the 
IIR filtering for a member of the coarse bank, A+pAA c b, which produces a range of filter outputs, labeled by Ui ... Ui M . These 
are grouped into M groups of summed IIR results Vi, . . . , Vm- The result for A + pA c f, + qAAft is obtained by combining these 
Vj's after each one is multiplied by dj(qAAib) and shifted by Ati(qAAfb). The entire data analysis process still computes 
iVfb filter results, by including iV c b possible p's and iVfb/iV c b possible g's for each p. [In the special case of q = 0, the Vj's are 
directly summed without having to go through multiplications and time shifts.] The downsampling or upsampling process is 
not shown. 



using a slightly more general reconstruction formula: 

m 

u(t; A + AA, t B , 4> c ) ~ d J v At; A , to + Ati, C ), (95) 

.7=1 

where dj are complex coefficients that depend on AA, 
given by 

dj = t Jk(Vk(A, t c , 4> c )\u{A + AA, t c , C )> (96) 

K 

with the matrix T given by 

Tjk = (Vj(A, t c , <f> c )\Vic(A,t c , (j> c )) (97) 

E. Full computational cost 

Fig. [3] illustrates the procedure of obtaining the out- 
puts from IIR filter chain for fine-bank coverage by inter- 
polating coarse-bank filter outputs described previously. 



Upon obtaining outputs from subgroups of IIR filters for 
the coarse bank, we need to reconstruct outputs for all 
members of the fine bank. We hereby estimate the cost 
for reconstruction. Let's assume that a member of the 
fine bank that is not a member for the coarse bank is AA 
away from a coarse-bank template A. For this A A, we 
need to go through each group J of filters, take the total 
output of this group (which corresponds to filtering by 
Vj), multiply it by the complex number dj (6 floating- 
point operations) and shift in time by At J c: and then add 
it to the sum (2 floating-point operations). The output 
eventually yields the SNR corresponding to the member 
of the fine bank. Note that both dj and At£ are func- 
tions of AA, but they do not need to be recalculated for 
each time step. 

Assuming our frequency division is made in a way such 
that each filter group has the same sample rate (Sj for 
group J), then the total recombination cost is 

Crccom = SSj . (98) 
J 
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FIG. 4: Matches achievable with a Newtonian-Chirp signal at A + AA, by various templates built for A, using initial LIGO 
noise spectral density for a (1.4+1.4)Mq NS-NS binary. Black solid curves corresponds to the result for a Newtonian-Chirp 
template, therefore the match is equal to unity at AA = 0. Red solid curve corresponds to that using IIR filters, while red 
dashed curve corresponds to the interpolated match that can be recovered by using 6 filter subgroups. 
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TABLE III: Break-down of recombination cost required for obtaining one fine-bank template using the interpolation method, 
for initial, Advanced LIGO and the Einstein Telescope — assuming a successive two-fold down-sampling and ignoring the cost 
of down- and up-sampling. The IIR filter information is listed in Table [III For each down-sampling channel, we list the number 
of filter groups, as well as each of their upper-bound frequency (i.e., fj for group J), and the computational cost as computed 



by Eq. (99 1. Computational cost here is measured by MFLOPS, or 10" FLOPS. 



In language of Sec. HID if we assume there are N gTOUPi k 
IIR filter groups for each down-sampling channel, then 
the recombination cost can also be written as 



C 



fc 



(99) 



As a consequence, assuming that AA c b = RAAfo, we 
have a total cost of 



C 



total 



r V R 

12N IIRik , 



R 



Creco 
SN group, k 



s k , (100) 



with the approximation valid when R^$> 1. In this case, 
we can have a good estimate of the computational cost 
of IIR filtering with interpolation. For a coarse bank 
with density 1/R the fine bank, filtering cost naturally 
decreases to 1 /R of the cost of conventional IIR filtering 
without interpolation. The cost of recombination can 
be estimated with a simple rule: for each sample rate, 
the cost of recombination is about 2/(3fife) times that of 



conventional IIR filtering, where 



rife 



IR.fe 



roup, A; 



(101) 



is the average number of IIR filters in groups at the fc-th 
sample rate. As a consequence, the total cost of the IIR 
filtering with interpolation scheme including recombina- 
tion can be lowered significantly if we achieve a balance 
of R 1 and n 1. Note larger R means larger coarse- 
bank grid size AA c t, for a fixed AAjj. This is achieved 
by introducing finer frequency intervals. On the other 
hand, finer frequency intervals means more IIR groups 
7V group or smaller n within each down-sampling channel. 

The computational cost for performing down- or up- 
sampling is implementation-dependent (see discussions in 
[7]). They are not included in our calculation for simplic- 
ity. We only need to perform data downsampling once 
for all templates, so the cost should be negligible com- 
pared to the total cost. The upsampling process is needed 
at least for each coarse-bank template, but only for fil- 
ter group outputs. Note the number of filter groups is 
much smaller than the total number of the IIR filters. 
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Depending on the type of upsampling filters, the upsam- 
pling cost can be negligible compared to the total cost, 
but can also be in similar orders as the recombination 
cost. This requires further investigation. 



F. Implementation for initial, Advanced LIGO and 
Einstein Telescope 

We first investigate the case of initial LIGO to demon- 
strate the feasibility of our interpolation method. Tak- 
ing into account the fact that even the optimal match 
between IIR filter and the real signal is not unity, we 
need to place the fine-bank IIR template a little denser 
than that from theoretical waveform. Theoretically for 
the Newtonian waveform, we have AAfb = 923 (in units 
of s~ 5 / 3 ) to have a minimum match of 0.97 for templates 
based on the signal waveform. For the IIR filter bank, we 
need a smaller spacing of AA™ = 800 s- 5 / 3 in order for 
the bank to achieve the same match between an IIR tem- 
plate and the signal at A + AA™. Fig. [4] shows numer- 
ically calculated match as a function of template spacing 
AA for templates from the signal waveform (black solid 
line) and for the IIR filters (red solid line) for the case 
of (1.4+1.4) Mq binary. Note that the numbers of 
fine-bank templates here are slightly different from those 
given in Table |TJ as we use slightly different overlap and 
also we use numerically evaluated matches here, instead 
of ones computed analytically assuming high match (in 
SecliYBl). 



IIR filtering is C^ k = 221 MFLOPS. Since the number 
of templates in the fine bank is 



To test the coarse-bank template placement, for sim- 
plicity, we restrict ourselves with the case of subdividing 
the frequency band into a total of six segments (or equiv- 
alcntly, six IIR filter groups in the time domain) . Accord- 
ing to the idealized theoretical calculations in frequency 
domain (Sec. IVC2|, the optimal frequency subdivision 
predicts AA c b/AAfb ~ 26 for a minimum match of 0.97. 
This calculation has assumed high match, and divides 
signals into parts that are strictly localized within sep- 
arate frequency bands. On the other hand, the numer- 
ical result using interpolation method on the IIR filter 
groups in the time domain (as prescribed in Sec. IV D) 
reveals that we can relax the coarse-bank srjacing up to 

meaning 



AA™ = 19845 s~ 5 / 3 (dashed curve in Fig. |4 



A^~25 
AA™ ~ " 



(102) 



This is in very good agreement with the idealized pre- 
diction. Fig. [4] shows in dashed line the numerical result 
of the match as function of AA for the interpolated IIR 
filtering method. 

We can now evaluate the total computational cost of 
the entire filtering-reconstruction process. For filtering, 
since we only have 



K = (A max - A mln )/(2AA i c 1 b K ) = 92 



(103) 



M = (A max - A min )/(2AA™) = 2281 



(104) 



while the reconstruction cost for each member is 
0.10 MFLOPS, the total cost for reconstruction (for 
members in the fine bank but not already in the 
coarse bank) is 228 MFLOP. Therefore the total cost 
for searching for Newtonian Chirps in initial LIGO is 
449 MFLOPS, or 0.5 GFLOPS. 

We carry out the same procedure for Advanced LIGO 
and ETb, with frequency division information listed in 
Table |III| and interpolation factor as well as break-down 



templates in the coarse bank [3D], and the cost for each 
full filtering is 2.4 MFLOPS (see Table Ink, the cost of 



of filtering and recombination costs listed in Table IV As 
we can read from Table |IV[ the computational power re- 
quired for a real-time search of Newtonian Chirps, using 
IIR filters and interpolation, in initial, Advanced LIGO 
and ET are 0.5 GFLOPS, 1.2 GFLOPS and 4.4 GFLOPS, 
respectively. The scaling of cost with / min is rather mild 
as expected, and the cost, even for ET, seems very man- 
ageable. 

In summary, it seems possible that to search for tens 
to hundreds of thousands of fine-bank templates for ad- 
vanced LIGO or ET, we can have the entire search 
done with a few desktop computers and fewer if other 
acceleration technique such as the Graphics Processing 
Unit [221 [53] can be adopted. While our result is based 
on the Newtonian chirp, this outcome should be applica- 
ble to Post-Newtonian (PN) cases. Note the low-latency 
pipeline LLOID with the FIR scheme in combination 
with downsampling and SVD technique [7J [TO] also pre- 
dicts manageable computing power for Advanced LIGO. 
MBTA method 6J, on the other hand, can already per- 
form network analysis to search for inspiral signals using 
PN waveforms with a few CPUs for the initial LIGO. 
How it scales with advanced detectors while maintaining 
low latency remains to be investigated (see also Sec.|V for 
a comparison of frequency vs time domain method) . The 
integration of the time-domain IIR filtering method with 
the infrastructure of the LLOID pipeline is currently un- 
der way. Preliminary result for the application of the IIR 
filterbank method to PN waveforms can be found in [21] 
and US]. 



V. TIME DOMAIN VS FREQUENCY DOMAIN 
APPROACH 



A. General consideration 

In terms of template interpolation, the ideas to divide 
the template into segments in the time or frequency do- 
main are equivalent in mathematics - both trying to rep- 
resent the template by the superposition of a complete 
basis of the continuous real-value function space on real 
axis. The functions in the basis are much simpler than 
the template, and thus easier to deal with. We can im- 
prove the computational efficiency by processing the ba- 
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92 
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70 


26069 
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0.083 
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TABLE IV: Break-down of total computational cost in MFLOPS in searching for Newtonian Chirps in initial LIGO, Advanced 
LIGO and ET, assuming interpolation for inspirals of 1-3 M§ individual masses. Here we list numbers of templates in both 
the fine (Ma,) and coarse banks (A4b), the computational cost for ea ch fu ll IIR chain (Cur, taken from Table |lT|) , as well as 
the recombination cost for each template (C reC omb, taken from Table III I. We then give the total IIR filtering cost (CjfR al ), 



the total recombination cost (C^olnb), an< l the grand total cost. We also list the ratio Ctotal/Ctotal, m which Ctotal represents 
computational cost for the full bank without using interpolation. 



sis functions first and then superpose them in the right 
way to get the result for a template. Given that the data 
we get from the detector is in the time domain, the ad- 
vantage of working in the time domain is that we can 
avoid procedures required to transform the data into fre- 
quency domain (e.g., data accumulation in Fourier trans- 
formation) and easily achieve low time latencies. On the 
other hand, working in frequency domain allows us to 
easily combine the algorithm with down sampling tech- 
nique and reduce the number of templates. 

The frequency-domain template interpolation tech- 
nique, e.g., that used in MBTA[H], usually uses Heavi- 
side function to cut the template. So the template can 
be superposed smoothly in the frequency domain while 
in the time domain the joint of different basis functions 
can be quite crude. This means that those methods with 
this technique could easily take advantages of working 
in the frequency domain, but not both in the time and 
frequency domain without substantial additional cost in 
computation. 

Our algorithm, with IIR filters working in the time do- 
main and template interpolation designed from the fre- 
quency domain, takes advantages of the benefits from 
both the time and frequency domain approach. Because 
we use a relatively smooth cut in both domains, we can 
both achieve low latency in the time domain and reduce 
the total number of templates while taking advantages of 
the down sampling technique. 



B. Comparison of computational efficiency 

When latencies of the analysis are not in concern, the 
frequency domain implementation of the cross correla- 
tion of data with templates (Eq. [IT]) is probably the most 
computationally efficient approach. This is due to the use 
of Fast Fourier Transform technique that has 0(N log N) 
operation count (N is the number of data points) as com- 
pared to the 0(N 2 ) operation count for the FIR method 
described previously. On the other hand, the operation 
count of the IIR interbank method is O(N) but multi- 
plied with a coefficient directly related to the possibly 
large number of filters needed to achieve a desired match 
to the chirp signal. Here we take latencies into consid- 
eration and compare the computational efficiency of the 
FFT-based method with the proposed IIR method. 



latency 



stretch 



FIG. 5: Analysis with overlapping data segments. The two 
horizontal lines represent two adjacent data stretches used for 
FFT. The lower data segment starts data accumulation with 
a delay of Ti a t en cy relative to the upper one. The duration of 
the overlap between the two stretches is that of the longest 
signal in the template bank (see text in Sec. VBl. 



To obtain low-latencies for the FFT-bascd matched fil- 



tering prescribed in Eq. (11), the most straightforward 
approach is to analyze data in overlapping segments. We 
consider the analysis of equal- length segments of duration 
^stretch as shown in Fig [5] with the duration of overlap 
equal to that of the longest signal, and the rest termed 

^latency, that is, 



-^stretch — ^longest ~t~ -^latency- 



(105) 



Here we assume the same strategy as in the current GW 
search pipeline where FFTs are performed with fixed 
length that accommodates the longest signal to ensure 
the coverage of signals of all possible duration. Note in 
practice, longer T strc tch might be needed to take into ac- 
count of the windowing effect of the FFTs and issues 
like the sharp notch filter problems due to lines in the 
noise power spectrum [2B]. For each data stretch, the 
output of Eq. ( [TT] ) has also the duration T strc tch, but due 
to the wrap-around effect of FFTs, only outputs (for sig- 
nals with ending time) within the last Ii aten cy are valid. 
This means that to obtain a valid output of duration 
Tiatency, a data stretch of at least ri onges t + ^latency needs 
to be processed. The requirement to perform filtering 
in real-time implies that the entire analysis needs to be 
completed within Ti atoncy seconds. The minimum total 
number of real multiplications and real additions for the 
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FFT algorithm is about 4N\og 2 N [27] 125 ] . Therefore 
the minimum computational cost in terms of FLOPS for 
each template for a real-time FFT-based matched filter- 
ing is at least, 



C 



FFT 



45 • T s tretch log 2 (S' • ^stretch) 



latency 



(106) 



where 5 is the data sampling rate. Here we assume a 
uniform sampling rate. 
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FIG. 



Computational cost as a function of Ti a t cnC y for a 



straightforward FFT analysis with overlapping data segments 
(solid line) and for the IIR filter method with downsampling 
technique ("+" symbols) and without ("x" symbols ) for real- 
time filtering with one template of a (1.4+1.4) Mq binary. 
The upper panel shows the cost for aLIGO and the lower one 
for ET. The dotted lines illustrate the equal cost between the 
FFT and IIR method and the corresponding latencies. The 
computational cost of the FFT method is calculated from 
Eq. ( |106| l with the longest template taken to be that of (1+1) 
Mq binary and sampling rate S=4096Hz. The IIR data are 
from Table[n| (column 15) (with downsampling) and Eq. ( 1091 
(without downsampling). 

In the FFT method, the actual delay Tdeiay between 
the end time of a GW signal and the event triggering 
depends on where the signal lies in the data stretch. The 
longest delay occurs when the ending time of a signal lies 
(^latency — dt) before the end of a data stretch where dt = 
1/5 is the sampling interval. In this case, after the signal 
ends, it takes the segment further (Ti atency — dt) time to 
finish accumulating data, and then another Ti ate ncy to be 
processed, resulting in a delay of, 



pFFT, worst 
delay 



271 



latency 



dt « 271 



latency 



(107) 



The shortest latency is achieved when the ending time 
of a signal lies just at the end of the data stretch, in which 
case the waiting time for the data to be analyzed is zero 



and the delay time of obtaining the trigger is simply the 
analysis time, 



7; 



FFT, best 
delay 



latency ■ 



(108) 



Therefore, for the FFT method, the delay time between 
the end of the signal and the event triggering is about 1— 
2 times of Ti atcncy . Although in previous LIGO inspiral 
search pipelines Tiatency is usually chosen so that adjacent 
data stretches are overlapped by 50%, it can be chosen so 
that Tiatency is much smaller, meaning data segments are 
analyzed with larger overlaps and higher computational 
cost. 

In comparison, for the IIR method, every new data 
point will be processed immediately when it is available. 
The delay time between the end of the signal and the 
triggering time can therefore in principle be as small as 
the data sampling interval. For real-time processing, the 
analysis time of the IIR filters at each time step should 
also be within one sampling interval, dt. As discussed 
previously, for each output of an IIR filter in (16 1, a to- 
tal of 12 floating point operations are needed. Hence 
to produce the IIR filter bank output in real time with- 
out downsampling for one template requires the floating 
point operation per unit time of 



and the delay 



Cf IR = 125 • N lm , 



T IIR _ jf 
-'delay — al - 



(109) 



(110) 



Here asterisk is used to indicate the computational cost 
of the IIR filter method without the downsampling tech- 
nique. 

Fig. (JsJ) shows the computational cost of one template 
for the FFT method as a function of Tiatency when search- 
ing for a GW from a (1.4+1.4) M NS-NS binary and 
its comparison to that of the IIR filter method with and 
without downsampling technique. It shows that the com- 
putational cost of the FFT based method increases as la- 
tencies decreases, the increase is particularly significant 
at latencies less than hundreds to thousands of seconds 
(Eq |106|), whereas IIR methods (Eq.d44k, Eq. HIM) 



have an inherent latency of the sampling interval (i.e 
not a function of latency) . It is clear that the IIR filter 
method presented in this paper has significant advantage 
over the FFT method in computational efficiency when 
low latencies are in demand. In particular, for Advanced 
LIGO, the IIR method can be much more efficient at la- 
tencies less than a few xlO 2 seconds. For the Einstein 
Telescope, IIR filter method can be much more efficient 
at latencies less than a few x 10 3 seconds. 

It should be mentioned that we compare only the core 
computational cost for the IIR and the FFT method for 
one template. We purposely leave out the cost of whiten- 
ing or the cost to take care of other FFT effect such as 
windowing effect as they are very much implementation- 
dependent. We also do not include template interpola- 
tion method for both methods as they are very much 
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implementation-dependent. In practice, both methods 
require that the raw data be conditioned, transported, 
pre-whitened before they are ready to be analyzed. These 
are expected to cause additional latencies on the order of 
tens of seconds. 



VI. CONCLUSION 

In this paper, we show that a time-domain search algo- 
rithm, with the flexibility of being able to detect a (non- 
precessing) compact-binary coalescence even before the 
final merger, is not only feasible for advanced and even 
future ground-based gravitational-wave detectors — but 
in fact can be realizable by a small number of state-of- 
the-art personal computers. 

In addition to employing the multi-rate technique for 
time-domain filtering, we have developed two additional 
key techniques in order to bring down the computational 
cost into the realm of feasibility: (i) the conversion of a 
chirp signal into a chain of IIR filters, and (ii) an algo- 
rithm that allows the reconstruction of filtering results 
of a finely spaced template bank from a much coarser 
bank, when each template in the coarse bank is divided 
into sub-templates. In order to illustrate the main tech- 
niques, we have restricted ourselves to the Newtonian 
Chirp, but it is rather straightforward to generalize our 
algorithms into post-Newtonian templates. 

Our main results on computational cost of the time- 
domain algorithm, for initial, advanced and future detec- 
tors, are summarized in Table [TV] With a simple compar- 
ison, we also conclude that our time-domain algorithm 
should require less computational resources than the con- 
ventional frequency-domain approach, when a short la- 
tency of less than hundreds to thousands of seconds is 
required — as shown in Fig. [6j 

Besides being computationally efficient at low (or even 
negative) latencies, the IIR filter bank method is also 
much simpler to implement than the FFT-based meth- 
ods, making it ideal for parallel computing, e.g., with 
Graphics Processing Units [22] . 

Two further ingredients must be added into the search 
pipeline before we can set up an early-warning system 
for EM follow-ups of compact binary coalescence: (1) a 
reliable veto strategy, and (2) an efficient algorithm for 



sky localization. The fact that our numerical results for 
IIR filter groups agree so well with frequency-domain an- 
alytical estimates (Sec. IV C ) assuming sharp divisions in 
frequency indicates that the sub-IIR-groups can be well- 
approximated as independent contributions to the SNR. 
This means a x 2 -like test that compares relative SNR 
contributions from filter subgroups to their expectations 
would be a promising veto strategy (see also [55] for other 
strategies that might be applicable for further efficiency 
improvement.) 

As for localization, we could in principle adopt the ex- 
isting algorithm already in place in the LIGO/VIRGO 
pipeline, which is based on coincidences of SNRs among 
multiple detectors. Alternatively, the fact that IIR fil- 
ters are separated in both time and frequency may pro- 
vide a possibility of developing a coherent search pipeline 
with feasible computational cost. The reason for the high 
number of templates in a coherent search is directly due 
to the multiplication of the high number of templates 
along the direction of mass parameters and the high num- 
ber of sky locations. However, as we divide each template 
into frequency segments, we find that in low frequencies, 
although there is a large number of cycles, and hence a 
requirement for a finer separation in mass parameters, 
the sky resolution of a detector network is low and there 
does not need a high number of sky patches; in high fre- 
quencies, we need a fine grid in the sky, but a coarse grid 
in mass parameters. As a consequence, we may need 
a much lower number of sub-templates are required for 
each frequency segment. This is currently being investi- 
gated. 
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